Designed to assess consistency in teacher judgment of
student essays and to assess conformity of teacher judgment with
expert judgment, the Composition Rating Scale (CRS) requires the
taker to rank-order five brief compositions. Requiring twenty minutes
to complete, the scale can be used to evaluate the consistency of
teacher judgments of compositions, to screen lay-composition readers,
or to prepare student teachers. [This document is one of those
reviewed in The Research Instruments Project (TRIP) monograph
"Measures for Research and Evaluation in the English Language Arts"
to be published by the Committee on Research of the National Council
of Teachers of English in cooperation with the ERIC Clearinghouse on
Reading and Communication Skills. A TRIP review which precedes the
document lists its category (Teacher Competency), title, author, and
date, and describes the instrument's purpose and physical
characteristics.] (RB)
The attached document contains one of the measures reviewed
in the TRIP committee monograph titled:

Measures for Research and Evaluation 
in the English Language Arts 



TRIP is an acronym which signifies an effort to abstract
and make readily available measures for research and evaluation
in the English language arts. These measures relate to
language development, listening, literature, reading, standard
English as a second language or dialect, teacher competencies,
or writing. In order to make these instruments more readily
available, the ERIC Clearinghouse on Reading and Communication
Skills has supported the TRIP committee sponsored by the Committee
on Research of the National Council of Teachers of English and
has processed the material into the ERIC system. The ERIC
Clearinghouse accession numbers that encompass most of these
documents are CS [illegible] through CS [illegible].
Category: Teacher Competency
Title: Composition Rating Scale
Author: Vernon H. Smith

Description of the Instrument:

Purpose: To assess consistency in teacher judgment of essays and
to assess conformity of teacher judgment with expert judgment.

Date of Construction: 1966

Physical Description: The CRS requires the taker to rank-order
five brief compositions. A simple and efficient scoring scheme is based
on deviation from experts' ranking of the same compositions. The test has
two forms.

Requiring twenty minutes to complete, it could be used in
studies where evaluating the consistency of teacher judgment of
compositions is important. It could also be used to screen
lay-composition-reader applicants. With the outside criterion (experts'
rankings) and with the ease of comparing judgments within a teacher
group, it could be useful for teacher training.

Validity, Reliability, and Normative Data: 

The best evidence offered by the author for the validity of the
test is the high degree of agreement among the experts who determined the
final ranking (for scoring purposes) of the essays. Interrater
reliabilities were .92 and .85 for two administrations of Form A and .88
and .84 for two administrations of Form B. The test-retest reliability
of the experts on each form was 1.00.

The basic validity question, of course, is whether the teachers'
judgment and ranking of the five test compositions is very similar to
the judgments they make on actual compositions. No evidence is reported
on that. Since the test compositions are limited to only one kind of
writing (a brief, personal letter in narrative form to a pen pal), the
test does not assess teacher judgment of other kinds of writing.

The reliability coefficient from scores on both forms by teachers
was .61. The test-retest reliability was .74 and .79 for Forms A and B
respectively. When the two forms were considered together as a larger
ten-item test, the test-retest reliability rose to .87. The author
concludes that "the most reliable results will be obtained when the two
forms of the test are given at the same time and the scores on each are
combined to give a total score."

Ordering Information:
EDRS

Related documents:

More information on the test (and Form A) is available in Vernon H.
Smith, "Measuring Teacher Judgment in the Evaluation of Written Composition,"
Research in the Teaching of English, 3 (Fall 1969), 181-195.

Thomas E. Whalen, "A Validation of the Smith Test for Measuring Teacher
Judgment of Written Composition," Education, 93, No. 2, 172-175.



Vernon H. Smith
"Composition Rating Scale"
1966




RESEARCH IN THE
TEACHING OF ENGLISH 

VOLUME 3, NUMBER 2, FALL 1969 



Contents 



ARTICLES 

Drama in the secondary school:
A study of objectives
by James Hoetker and Richard Robb

Understanding allusions in literature
by Marjorie Roberts

The effects of prereading assistance
on the comprehension and attitudes
of good and poor readers
by Richard J. Smith and Karl D. Hesse

ITA and TO training in the development
of children's creative writing
by Joanne A. Auguste and Fredric B. Nalven

Measuring teacher judgment in
the evaluation of written composition
by Vernon H. Smith

Teaching punctuation in the ninth grade
by means of intonation cues
by Jeanette R. Held

An experimental comparison of writing
achievement in English composition
and humanities classes
by William C. Budd



"PERMISSION TO REPRODUCE THIS COPYRIGHTED
MATERIAL HAS BEEN GRANTED BY

National Council of
Teachers of English

TO ERIC AND ORGANIZATIONS OPERATING
UNDER AGREEMENTS WITH THE NATIONAL
INSTITUTE OF EDUCATION. FURTHER REPRODUCTION
OUTSIDE THE ERIC SYSTEM REQUIRES
PERMISSION OF THE COPYRIGHT OWNER."



Mr. Smith explains how he developed and validated
two forms of a composition scale to test
teacher evaluation of elementary writing. After
trying out the test in various groups of prospective,
beginning, and experienced teachers, he
concludes that valid judgment of the quality of
elementary writing is independent of experience,
academic preparation, and professional
training.

Measuring teacher judgment in
the evaluation of written
composition¹



VERNON H. SMITH 
Indiana University 

We have known for a long time that raters would not always
agree on the value of a particular composition.

It is common knowledge to student and teacher alike that the
grading of essay material can be highly inconsistent. The grade
given to an English theme may vary considerably among different
raters and even with the same rater at different times.²

Perhaps the most dramatic evidence of this lack of agreement
came from the study by Diederich, French, and Carlton.³
In Research in Written Composition this study is summarized
as follows:



They (Diederich, French, and Carlton) analyzed the way ten
English teachers rated 300 two-hour compositions by college
freshmen in comparison to 43 other raters: social scientists,
natural scientists, writers and editors, lawyers, and business
executives. The raters were given no standards or criteria for
judging the papers, merely asked to sort the themes into nine
piles in order of general merit, with not less than 4 per cent of
the papers in any pile. It was "disturbing to find that 94 per
cent of the papers received either seven, eight, or nine of the
nine possible grades, that no paper received less than five
different grades, and that the median correlation between readers
was .31. Readers in each field, however, agreed slightly better
with the English teachers than with one another."⁴

While a number of studies have shown similar disagreement
among theme raters, the Diederich-French-Carlton research is
impressive because of the number of raters, the number of
themes, and the magnitude of disagreement. The themes in
this study were written by college freshmen. Other studies in
this area have focused on themes written by high school or
college students. Studies below the high school level are rare.

Paul Diederich once summed up his investigations of the
rating of compositions by English teachers as follows:

The average commentary on teachers' comments need not be
quite so brutal as this, for I have compressed into one paragraph
a large number of flaws that I have found in many
samples of papers marked by teachers that I have examined
in research studies. I hate to say it, for I am kindly disposed
toward all English teachers, but the dominant impression left
by these studies is that the average English teacher, both in
high school and in freshman composition courses, is barely
literate, capricious in judgment, full of prejudices that have no
basis in anyone's system of grammar, rhetoric, or style, hard to
decipher, eager to misinterpret, and given to comments that have
no connection with anything the student has written. . . .⁵ (italics
mine)

¹ This article is based on a doctoral study completed in 1966 under
the direction of Professor Harold M. Anderson at the University of
Colorado and on subsequent research still in progress by the author.
The dissertation, titled An investigation of teacher judgment in the
evaluation of written composition including the development of a test
for the measurement thereof, is available from University Microfilms,
Ann Arbor, Michigan (Order No. [illegible]).

² J. C. Follman and J. A. Anderson, "An investigation of the reliability
of five procedures for grading English themes," Research in the
Teaching of English, 1967, 1, 190.

³ P. B. Diederich, J. W. French, and S. Carlton, Factors in judgments
of writing ability (Research Bulletin RB-61-15, Princeton, N.J.:
ETS, 1961).

⁴ R. Braddock, R. Lloyd-Jones, and L. Schoer, Research in written
composition (Champaign, Ill.: NCTE, 1963), p. 41.

⁵ P. B. Diederich, "The problem of grading essays" (Princeton, N.J.:
ETS, 1957, pp. 7-8. Mimeographed).

Capricious Rating

When the Diederich-French-Carlton report was published
in 1961, I was the K-12 supervisor of English in a large
suburban school district. I wondered whether the same "capricious
judgment" found among high school and college English
teachers could exist among teachers at lower grade levels. It
might be very difficult for a third grader, or a fifth grader,
or an eighth grader to develop his composition skills if his
teacher rated his themes high one year and if another teacher
rated them low the next year. Is it possible for a student to
get a teacher whose judgment is contrary to that of most other
teachers?

Armed with curiosity and some themes I had borrowed
from a fifth grade teacher, I attended a meeting of teachers
from grades one through nine. Since the purpose of the
meeting was to discuss the teaching of composition, the teachers
were willing to participate in the little exercise I had prepared.
Each teacher was given copies of the same seven short
themes and asked to pick the two best and two worst. The
fifth grade teacher and I had already picked a couple we
thought were good, a couple we thought were poor, and three
that were in between. Although the majority of the teachers
at the meeting agreed with our initial choices, each of the
seven essays was picked as one of the best and as one of the
worst by some of the teachers present. The results indicated
that Diederich's "capricious judgment" was not restricted to
high school and college English teachers. This was the beginning
of an investigation into teacher judgment in the evaluation
of written composition. Although further research is still
in progress, the purpose of this article is to summarize the
development of a test for measuring teacher judgment in
evaluating themes and to summarize the results of the administration
of that test to a sample of almost 200 teachers from grades
one through twelve.

Composition Rating Scales

Although this investigation is not directly concerned with
composition rating scales, a report by Follman and Anderson
comparing five evaluation scales clarified the relation between
teacher judgment and the use of such scales.⁶ Their study
compared four formal procedures and the "Everyman's
Scale," an informal procedure by which the rater is instructed
to use his own judgment in rating a set of themes. The
directions for the Everyman's Scale include the following:

There is no particular grade that each essay should receive. You
evaluate each essay according to your own judgment as to what
constitutes writing ability. Use your own judgment about the
writing ability as indicated by each essay. Don't use any system
other than your own judgment. (italics mine)

⁶ Follman and Anderson, op. cit.

The results of the Follman-Anderson study indicate that
[the Everyman's Scale had] comparatively high reliability. [The authors]
repeat the last three sentences from the paragraph quoted
above and then add this comment:

[Illegible] expectation is that such instructions would
[illegible] difference and inconsistency among the [illegible].

It may now be suggested that the unreliability usually obtained
[in the evaluation of] essays occurs primarily because raters
[are] heterogeneous in academic background
and [have] experiential backgrounds which are likely
to produce different attitudes and values which operate
significantly in their evaluation of essays. The function of [illegible]
perception and gives direction to his attitudes
[and values]. In other words it points out what he should
look for and guides his judgment. (italics mine)

THE PROBLEM

This investigation is concerned with [illegible]:

1. Can judgment in the evaluation of written composition
be measured validly, efficiently, and reliably?

3. To what extent is there agreement in judgment among
teachers of composition at three levels: elementary, junior
high, and senior high? Since the high school and junior high
teachers in the study have considerably greater academic
background in English than the elementary teachers, another
question is included in this one: Is academic background in
English a factor in judgment?

4. How does the judgment of teachers at these three levels
compare with that of experts?

5. How does the judgment of prospective and beginning
secondary English teachers compare with that of the experts
and with that of experienced secondary English teachers? Is
teaching experience a factor in judgment?

6. How does the judgment of a select group of nonteachers
compare with that of the experts and with that of secondary
English teachers? Since the nonteachers in the study had
academic backgrounds in English that were similar to those of
the secondary English teachers (most of the nonteachers being
English or journalism majors), this question also explores
the possibility of a factor in judgment related to methodology
or teacher education.

7. Are there teachers in any of the groups whose judgments
are contrary to that of the experts and to that of the majority
of other teachers?

Questions three through six were rewritten as twelve null
hypotheses for testing.

METHOD

Test Development

The idea for the test originated with some work with the
STEP Essay Test⁹ in a pilot testing program in several
elementary schools. Several samples of student written responses
on the STEP Essay Test were used to screen and train lay
readers, some of whom were to read and score the STEP essays.
To make the selection of readers more objective, a scoring
system was used based on deviations from the scores assigned
by a small group of classroom teachers according to
the suggested method for scoring these tests.¹⁰

Since the students [text illegible], the
samples were relatively long, and scoring by reader applicants
took some time. The 12 samples used originally were reduced
to seven and later to five by eliminating those that had
little discriminatory power, i.e., those that almost everyone
agreed upon.

⁹ Sequential tests of educational progress, essay test, form 4A
(Princeton, N.J.: ETS, 1957).

¹⁰ STEP handbook for essay tests, level 4 (Princeton, N.J.: ETS,
1957).

While the instrument was still in this preliminary stage, a
variation of it was used in some inservice meetings with
secondary English teachers. Some excerpts from the seven and
later from the five samples were duplicated and given to
teachers who were asked to pick the two best and two worst.
Then the teachers were given the opinions of the elementary
teachers on these samples, and a lively discussion usually
followed.

Eventually some other shorter samples of writing from some
other fifth grade classrooms were collected, and out of the new
samples plus excerpts from the original samples two forms of
the test were developed.¹¹ Each form of the test consists of
five samples of writing which are to be ranked from best to
worst. When either form is given to a group of 30 to 50
teachers, each sample is usually ranked as best by some and
as worst by some.

The Scoring System

Each form of the 5-item test now consists of two essays that
are ranked high (better) by a majority of teachers and two
that are ranked low (worse) by a majority and one that is in
between. After experimenting with several complex methods
of scoring, none of which was satisfactory, the writer
developed a simple easy-to-score system.

When either of the two "better" essays is ranked first or
second, it scores one point. When either of the two "worse"
essays is ranked fourth or fifth, it scores one point. The middle
essay scores one point if it is ranked second, third, or fourth.
Possible scores range from zero to five.
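
The scoring rule just described can be sketched in a few lines of Python. This is only an illustration: the function name `score_form`, the essay labels A to E, and the answer key are hypothetical, since the article does not publish the key in this passage.

```python
def score_form(ranks, better, worse, middle):
    """Score one form of the five-essay ranking test (0 to 5 points).

    ranks  -- dict mapping essay label to the rank (1 = best, 5 = worst)
    better -- the two essays the key treats as "better"
    worse  -- the two essays the key treats as "worse"
    middle -- the single in-between essay
    """
    if sorted(ranks.values()) != [1, 2, 3, 4, 5]:
        raise ValueError("each rank from 1 to 5 must be used exactly once")
    points = 0
    # A "better" essay earns a point when ranked first or second.
    points += sum(1 for essay in better if ranks[essay] in (1, 2))
    # A "worse" essay earns a point when ranked fourth or fifth.
    points += sum(1 for essay in worse if ranks[essay] in (4, 5))
    # The middle essay earns a point when ranked second, third, or fourth.
    points += 1 if ranks[middle] in (2, 3, 4) else 0
    return points


# Hypothetical key: the real assignment of essays is not given in the text.
KEY = dict(better=("A", "B"), worse=("D", "E"), middle="C")

matches_key = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
fully_reversed = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}
lowest_case = {"C": 1, "D": 2, "E": 3, "A": 4, "B": 5}

print(score_form(matches_key, **KEY))     # 5
print(score_form(fully_reversed, **KEY))  # 1 (only the middle essay scores)
print(score_form(lowest_case, **KEY))     # 0
```

Note that a complete reversal of the key still earns one point, because the middle essay keeps its place; a score of zero requires the middle essay to be ranked first or last.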

The Sample Population

The sample population included over 200 subjects who
came from three sources: classroom teachers in the Jefferson
County, Colorado, Public Schools; students in undergraduate
and graduate classes in English and education at the University
of Colorado; and a select group of nonteachers who were
composition readers or composition reader applicants in this
same school district.

¹¹ Form A of the test is included as an appendix to this article.
Copyright, 1966, Vernon H. Smith.

The distribution of the sample population was as follows:

High School English Teachers                             54
Junior High School English Teachers                      44
Elementary Teachers (Grades 1-6)                         32
Prospective and Beginning Secondary English Teachers     61
Nonteachers                                              27
Total                                                   218

The experts in the study were five secondary English
teachers who had been formally recognized as outstanding in
the teaching of composition within their school districts or by
some outside agency.

The groups that served as subjects in the development of
the test, the sample population, and the experts were mutually
exclusive.

Administration of the Test

The test was administered to the sample population in various
groups. The experts took the test individually, usually by
mail.

The directions are designed to be self-explanatory and to
give no information that might bias the testee in ranking the
essays. The following directions appear on each test:

Below are five themes written by students in the same class.
Using your usual criteria for evaluating written work, rank the
five selections in order. Put a 1 in the blank to the right of
the composition that you consider best, a 2 by the second best,
and so forth on to 5, which would indicate the worst.

The administration time for either form of the test is from
10 to 20 minutes.

In the testing situation nothing was said or discussed that
would affect the results of a second administration of the test.

Statistical Procedures

The significance of the agreement of the five experts was
determined by Snedecor's formula for intraclass correlation, an
application of analysis of variance for a small group of raters
recommended by Ebel in a study of various formulas for
intraclass correlation.¹²

All other statistical procedures used can be found in standard
statistics textbooks. The significance of differences among
and between the subgroups in the sample population and the
experts was determined by analysis of variance. The Pearson
product-moment method was used to determine the test-retest
reliability coefficients and the reliability coefficient between
the two forms of the test.

¹² R. L. Ebel, "Estimation of the reliability of ratings," Psychometrika,
1951, 16, 407-424.
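
The article does not reproduce the intraclass formula it used. As a rough illustration only, the sketch below computes one common one-way analysis-of-variance form of the intraclass correlation for a single rater, (MSb - MSw) / (MSb + (k - 1) MSw), where MSb is the mean square between essays, MSw the mean square within essays, and k the number of raters. The function name and both data sets are hypothetical; they are not the experts' actual rankings.

```python
def intraclass_correlation(ratings):
    """One-way intraclass correlation for a single rater.

    ratings -- list of rows, one row per essay, one column per rater.
    """
    n = len(ratings)      # number of essays
    k = len(ratings[0])   # number of raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Mean square between essays.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Mean square within essays (disagreement among raters).
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical rankings: five essays (rows) ranked by five raters (columns).
unanimous = [[1] * 5, [2] * 5, [3] * 5, [4] * 5, [5] * 5]
one_swap = [[1, 1, 1, 1, 2], [2, 2, 2, 2, 1], [3] * 5, [4] * 5, [5] * 5]

print(intraclass_correlation(unanimous))           # 1.0
print(round(intraclass_correlation(one_swap), 3))  # 0.968
```

With complete agreement the within-essay mean square is zero and the coefficient is 1.0; a single rater swapping the top two essays already lowers it slightly.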

The relevance of the test is based on the assumption that
the judgment used in ranking the five essays is the same or
similar to the judgment used by teachers in evaluating students'
written compositions. To the extent that this assumption
is true, and only to that extent, the test has logical relevance.
Whether this assumption is true or not, it is a common assumption
underlying attempts to make the evaluation of essay
tests and compositions more consistent and reliable.

The construct validity of the test is based on the agreement
and consistency among the five experts. It was hypothesized
that if the test were valid, experts in the teaching of composition
would agree in ranking the essays, and that their judgment
on two administrations of the test would be stable.
The experts took both forms of the test twice with an interval
of six to ten weeks between administrations. The reliability of
the scores of the experts is an essential part of the validity of
the test.

Using Snedecor's formula for intraclass correlation,¹³ the
interrater reliabilities for the experts on Form A first and
second administrations were .920 and .850, respectively, with
reliabilities of average ratings .983 and .966. For Form B the
interrater reliabilities were .880 and .840 with reliabilities of
average ratings of .973 and .963. The coefficient of stability,
test-retest reliability, for the experts on each form of the test
was 1.00.
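
The reported reliabilities of average ratings are consistent with applying the Spearman-Brown step-up to the single-rater coefficients for five raters. The sketch below checks that arithmetic; it assumes, rather than knows, that this is how the averaged figures were obtained, and the function name is hypothetical.

```python
def reliability_of_average(single_rater_r, n_raters):
    """Spearman-Brown step-up: reliability of the mean of n_raters ratings."""
    return n_raters * single_rater_r / (1 + (n_raters - 1) * single_rater_r)


# Single-rater interrater reliabilities reported for the five experts.
reported = {
    "Form A, first administration": 0.920,
    "Form A, second administration": 0.850,
    "Form B, first administration": 0.880,
    "Form B, second administration": 0.840,
}

averaged = {
    label: round(reliability_of_average(r, 5), 3)
    for label, r in reported.items()
}
for label, value in averaged.items():
    print(label, value)   # 0.983, 0.966, 0.973, 0.963 in that order
```

The four results match the averaged-rating figures given in the text, which is also a useful check on the digits in a worn copy of the article.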

The reliability of the test was determined by administering
both forms of the test to two selected groups twice. The
groups selected were students in two summer school classes at
the University of Colorado, one in Teaching Reading in the
Secondary School, the other in Teaching Literature to Adolescents.
These two classes were selected because they enrolled
a number of English teachers and because they were not
courses that would intentionally affect the teachers' evaluation
of written composition. The two test sessions were four weeks
and two days apart. The test was not discussed in either class
before or after either session. The students were told only that
they were assisting [illegible].

The coefficient of equivalence between Forms A and B
was determined by using the results from the first administration
of both forms. The coefficient of equivalence between
forms was .607. The coefficients of stability for the two forms
were determined by the test-retest method. The coefficients
of stability were .739 and .790 for Forms A and B respectively.
A third type of reliability coefficient, the coefficient of
equivalence and stability, was determined by finding the correlation
between the first administration of Form A and the second
administration of Form B. The resulting coefficient was .679.

¹³ Ibid., p. 411.

While reliability coefficients from .60 to .79 would not be
highly regarded in the field of objective testing, they would
certainly be considered significantly different from zero. This
test attempted to measure judgment, a subjective factor, and
these coefficients compare favorably with other studies of
rater reliability.

There are two possible explanations for these relatively low
reliabilities. Teacher judgment in the evaluation of written
composition, the factor which the test attempted to measure,
may not be very reliable. Diederich gives an example of reading
a set of papers after an interval and finding a correlation
of only .54 with his own first reading.¹⁴

The other possible explanation is that in creating a relatively
simple, short instrument, length which might have improved
reliability has been sacrificed. To test the latter a
fourth reliability coefficient was calculated. The two forms
were considered as one longer test of 10 items, and the
test-retest reliability, the coefficient of stability, was .870.
Therefore, the most reliable results will be obtained when the two
forms of the test are given at the same time and the scores on
each are combined to give a total score.
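
The gain from combining the forms is about what classical test theory would lead one to expect. As a hedged illustration, the sketch below applies the Spearman-Brown prophecy formula for a test of doubled length to the two single-form stability coefficients. The .870 in the article was computed from the data, not from this formula; the function name is hypothetical.

```python
def spearman_brown(r, length_factor):
    """Predicted reliability when a test is lengthened by length_factor."""
    return length_factor * r / (1 + (length_factor - 1) * r)


# Test-retest (stability) coefficients reported for the single forms.
form_a, form_b = 0.739, 0.790

predicted_from_a = spearman_brown(form_a, 2)
predicted_from_b = spearman_brown(form_b, 2)

# Prints roughly 0.85 and 0.88, which bracket the observed .870.
print(round(predicted_from_a, 2), round(predicted_from_b, 2))
```

The observed ten-item coefficient falls between the two predictions, which supports the article's reading that test length, and not only the instability of judgment, limits the reliability of a single form.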

Results of the Sample Population

The consensus of all of the teachers in the sample and of
all of the persons in each subgroup agreed with the consensus
of the experts on the rank for each item on both forms of the
test. However, there was much greater variance among the
individual subjects in the sample population than among the
experts. Analysis of variance indicated that all teachers in the
sample and each of the five subgroups (nonteachers, elementary
teachers, junior high English teachers, high school English
teachers, and prospective and beginning teachers) differed
significantly (at the .025 or .01 levels) from the experts. There
were no significant differences (.05 level) among the five
subgroups.

¹⁴ P. B. Diederich, op. cit., p. 6.

Individual test scores on Form A or B were interpreted as
follows: (1) A score of 4 or 5 indicates agreement with the
experts in judgment in the evaluation of compositions as
measured by the test. Since the consensus of all teachers in
the study agreed with the experts, the same scores indicate
agreement with the consensus of composition teachers at all
levels, grades one through twelve. (2) A score of 0, 1, or 2
indicates judgment that is contrary to that of the experts and
to the consensus of composition teachers at all levels. To get
a score of 2, an individual has ranked three of the five themes
in positions contrary to the ranking of the experts. (3) A score
of 3 falls between the two extremes and indicates borderline
or marginal agreement.
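
The three interpretation bands can be written as a small lookup. This is a sketch only; the function name and the one-word labels are hypothetical shorthand for the categories described above.

```python
def interpret(score):
    """Map a single-form score (0 to 5) to the article's three categories."""
    if score not in range(6):
        raise ValueError("a single-form score must be an integer from 0 to 5")
    if score >= 4:
        return "agreement"    # agrees with the experts and the consensus
    if score == 3:
        return "borderline"   # marginal agreement
    return "contrary"         # scores of 0, 1, or 2


labels = [interpret(s) for s in range(6)]
print(labels)
```

Scores of 0, 1, and 2 all fall in the same band because even a score of 2 means that three of the five themes were placed contrary to the experts' ranking.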

When the above interpretation of individual scores was
used, almost half (48% on Form A; 41% on Form B) of the
teachers in the sample agreed with the experts. However, more
than half did not agree. On each form approximately 55%
disagree or are borderline. Twelve per cent on Form A and 19% on
Form B disagree or have judgment that is contrary to that of
the experts. If the judgment of the experts as defined and
measured in this study is accepted, then these persons are not
competent to make such judgments. These results indicate an
unpleasant situation for the child as he learns to write. His
chances of getting a teacher whose judgment does not agree
with the judgment of the experts are slightly greater than even.
The chance that he will get a teacher whose judgment is
contrary to that of the experts and to the consensus of other
teachers of composition is about one in six; that is, he might
expect two such teachers in his public school years.
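
The "two such teachers" figure follows from simple expected-value arithmetic. The sketch below assumes one composition teacher per year over twelve school years (grades one through twelve) and independent assignment each year; both are simplifying assumptions, not statements from the article, and the function name is hypothetical.

```python
from fractions import Fraction


def expected_contrary_teachers(rate, years=12):
    """Expected number of 'contrary' teachers met over a school career."""
    return rate * years


# "About one in six" over twelve years gives exactly two teachers.
print(expected_contrary_teachers(Fraction(1, 6)))          # 2

# The observed rates on the two forms bracket that figure.
low = expected_contrary_teachers(0.12)    # Form A: 12 per cent
high = expected_contrary_teachers(0.19)   # Form B: 19 per cent
print(round(low, 2), round(high, 2))
```

The one-in-six figure sits between the 12 per cent and 19 per cent observed on the two forms, so the expected count ranges from a little under one and a half to a little over two.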

All subgroups differed significantly from the experts. In
each group there were more persons who agreed with the experts
than who were borderline or disagreed, but the combined
total of those whose judgment was borderline and those
whose judgment was contrary was greater than the number
agreeing with the experts.

There were no significant differences among the five subgroups.
The variations within each subgroup were much
greater than the variations among the five subgroups. A difference
between elementary and secondary teachers would
have suggested that academic preparation was a possible
factor in judgment as measured by the test. A difference between
prospective and beginning secondary English teachers
and experienced secondary English teachers would have suggested
that experience was a possible factor in judgment. A
difference between nonteachers and teachers would have suggested
that professional training was a possible factor in judgment.
Since none of these differences was found, judgment as
measured by the test may be independent of experience, academic
preparation, and professional training.

CONCLUSIONS

1. The test results indicate that the subjective judgment of
teachers in evaluating a specific set of short written compositions
can be measured, and the results can be treated statistically.

2. Among experts in the teaching of composition, agreement
in judgment, as measured by this test, does exist and is reliable.

3. Judgment, as measured by this test, is not related to experience,
academic background, or professional training.

4. Although the consensus of teachers on any one item on
the test agrees with the judgment of the experts, more than
half of the teachers do not agree with the experts in judgment
as measured by this test.

5. A significant number, between 10 and 20%, of classroom
teachers charged with the responsibility for teaching students
to write in grades one through twelve have judgment, as
measured by this test, that is contrary to that of experts in
the teaching of composition.

DISCUSSION

The development of this test and the subsequent research
reported herein represent an exploratory probe into an area
where little prior measurement had been attempted. The results
reported here should be considered tentative until a body
of research in this area becomes available.

There may be some questions about the nature and quality
of the writing samples used in this study. The samples from
the STEP Essay Test were used because they were available.
Their use should not be regarded as an endorsement for the
use of such mundane writing assignments by classroom teachers.
Teacher judgment in the evaluation of more imaginative
writing may be more complex and even more subjective than
it was on the samples used in this test. In addition, the range
of the five samples on any form of the test would not be at all
typical of the range in any classroom. Obviously superior and
obviously poor samples could not be used in the test because
they failed to discriminate well. The five samples on each
form were picked on their discrimination value as test items,
not on their literary merits.
The suggestion by FoUmau and Anderson that high relia- 
bility b the uso of rating scales may bo a measure of the 
homogeneity of the background of the raters rather thaa a 
measure of the scale poses a similar question with regard to 
this tcsv.^5 Teachers in one largo school district and under- 
graduate and graduate students in a few olass^ In one unl^ 
verslty could certainly be considered mote homogeneous than 
a general population. Replication wilh a more heterogeneous 
population would be helpful.*^ 

This writer expected to find differences in judgment due to
experience, academic background, and professional training.
The acceptance of the null hypotheses in these areas does not
necessarily mean that such differences do not exist. They may
exist, but they were not measured by this test.

This test should not be used to evaluate teacher competence
in the teaching of composition. Too many other factors, including
motivation, inspiration, and teaching techniques, play
vital roles in teaching composition.

As students mature, essays and essay assignments are normally
longer and more complex. Judgment on longer essays
may be subject to greater variance than judgment on the relatively
short samples in this test. This may account for the
greater variance usually found among raters on themes at the
college level.

¹⁵ Follman and Anderson, op. cit.

¹⁶ The author wishes to encourage replication and further research. He
will provide additional information, additional forms of the test as they
are developed, and scoring instructions. Interested persons should write
to Vernon H. Smith, School of Education, Indiana University, Bloomington,
Indiana 47401.

¹⁷ R. L. Wright and H. Rubenstein, "Can college students recognize
good writing?" (Michigan State) College of Education Quarterly,
C, 11-20.

Wright and Rubenstein found that poor writers had little
ability to discriminate among compositions of varying merit.
In the same study, rank order assigned by good writers was
close to that assigned by faculty members. These results suggest
that judgment of written composition and writing ability
may be related. This test could be used to determine the correlation,
if any, between judgment and writing ability by
giving the test to a group of subjects from whom writing
samples were collected. If a significant correlation were found,
follow-up studies might determine the effects of training in
either area on the other. If some type of instruction in evaluation
produces an improvement in writing ability, both teachers
and students would benefit.
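
As an illustration only, the correlation study suggested above could be computed along the following lines. This is a minimal sketch, not the author's procedure: the source does not say which correlation coefficient would be used, so Spearman's rank correlation is assumed here, and the names `judgment_scores` and `writing_scores` and all of the numbers are invented for the example.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n (smallest = 1); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)  # undefined if either input is constant


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two sets of ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented data: one judgment-test score and one writing-sample score per subject.
judgment_scores = [2, 4, 6, 8, 10, 12]
writing_scores = [55, 60, 58, 70, 75, 90]

rho = spearman_rho(judgment_scores, writing_scores)
print(round(rho, 3))
```

A coefficient near zero would be consistent with no relation between the two abilities. Whether a given value is significant depends on the number of subjects, which this sketch does not test.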

While this study investigated judgment, it made no attempt
to investigate changes in judgment. Change in judgment under
specified conditions offers opportunity for further research.

It is impossible to discuss the results of this study without
confirming the lack of basic research in the teaching of written
composition and in the evaluation of written composition
which was pointed out by Braddock, Lloyd-Jones, and Schoer
in Research in Written Composition.¹⁸
APPLICATIONS The test developed in this study could be used for the
following purposes:

1. to provide individual teachers and prospective teachers
with knowledge of their judgment in the evaluation of written
compositions;

2. to focus attention on the problems of evaluating written
compositions by providing groups of teachers (secondary
English departments, elementary school faculties, workshops,
in-service training sessions) with a means of comparing individual
judgments with other judgments within the group and
with an outside criterion;

3. as part of a battery of tests to screen composition reader
applicants;

4. as a tool to screen raters in research when judgment in
the evaluation of written compositions is a factor.
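
The comparison described in the second purpose, individual judgments set against an outside criterion and against the rest of the group, might be sketched as follows. This is a hypothetical illustration and not the scoring method of the test, which is not reproduced in this section. The function `rank_deviation`, the criterion ranking, and the three teachers' rankings are all invented. Deviation is assumed here to be the sum of absolute rank differences over the five compositions.

```python
def rank_deviation(taker_ranks, criterion_ranks):
    """Sum of absolute differences between two rank orders of the same items."""
    if sorted(taker_ranks) != sorted(criterion_ranks):
        raise ValueError("both rankings must use the same set of ranks")
    return sum(abs(t - c) for t, c in zip(taker_ranks, criterion_ranks))


# Invented criterion: entry i is the rank (1 = best) given to the i-th composition.
criterion = [3, 5, 1, 2, 4]

# Invented rankings of the same five compositions by three group members.
group = {
    "Teacher A": [3, 5, 1, 2, 4],
    "Teacher B": [2, 5, 1, 3, 4],
    "Teacher C": [5, 1, 3, 4, 2],
}

# Each member's distance from the outside criterion (0 = identical ranking).
deviations = {name: rank_deviation(ranks, criterion) for name, ranks in group.items()}

# The group average gives each member something to compare against.
group_mean = sum(deviations.values()) / len(deviations)

for name, score in deviations.items():
    print(name, score, "above group mean" if score > group_mean else "at or below group mean")
```

With five compositions the deviation under this assumption ranges from 0 (complete agreement with the criterion) to 12 (for example, a fully reversed ranking).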

In addition, the measurement technique developed in this
study might be applied to any area where subjective judgment
is a factor in evaluation.



FORM A*

¹⁸ R. Braddock, R. Lloyd-Jones, and L. Schoer, op. cit.
* Directions for the test are given in the article above.
Copyright, 1966, Vernon H. Smith.

I.

Dear Pen Pal,

I am in the fifth grade this year. I think I'm a very lucky
boy. I have several pets. I'm joining 4-H with my horse this
year.

there are four people in my family. I'm also lucky I'm in
this family. We do many things. Yesterday we went up in the
mountains to get peat-moss.

I go to school at Lincoln. Where do you go to school? I
have been lucky with teachers.

In the summer we go water skiing and camping. We mostly
go water skiing. Its a lot of fun. What do you do in the
summer?

Your friend
Eric

II.

Dear Pen Pal,

I live in Denver, I like where I live. I go to Lincoln that
is the name of my school. My name is Beverly. I would like
know your name? I have one brother and no sisters. My
monther works at the Honeywell Plant and my dad workes at
Dave Cooks. I am in the fifth grade. My teacher's name is
Miss Jones. My princeabule is Mrs. Brown. On Saterdays we
clean the house and, on Sundays we rest and mother and I
fix the diner.

Your friend,
Beverly

III.

Dear Pen Pal,

My name is Leonard. You do not know me. I live in Colorado.
My age is 10 years old.

My family lives here with me, but my brother does'nt. He
lives in Texas. He works at a rocket fuel plant, which is called
Rocketdyne. My mother just started to work on Monday. My
father is a teacher. He teaches 7th grade geography, 8th grade
American history, and 9th grade civics. My sister is trying to
get a job.

My school is called Lincoln. It is a very nice school. My
teacher's name is Miss Jones. And we, that is the whole school,
have the nicest principal in the whole school district. Her
name is Mrs. Brown.

When there is no school, I just ride my bicycle and play. I
live in an apartment. There's a swimming pool but the
manager closed it up so we cant swim in it until the first of
June. When school is out for the day, almost everytime you
see me I'm eating popcorn or drinking root beer. Unless I'm
doing something else. There is a playground too at the apartments.
There's a slide, a merry go round, 4 swings, and a
jungle jim.

Very truly,
Leonard

IV.

Dear Pen Pal,

I am a girl. I live in the United States. My state name is
Colorado I live in a suberb of Denver. I have blond hair and
blue eyes.

I have lots of pets I have a dog, cat, and bird. My dog is a
big dog. she is a Siberen Huskey, she has lots of fur on
her. When we pluck her we get bags and bags of fur. In
Winter when snow falls we hook her to a sled with her harnes
on her back and she is read to pull. Down the street my dog
and I go. My cat is gray, she has had several litters of kittens
when we take them to be sold we half to make sure she
doesnt see us or else she'll know that were taking them away.
If she sees she will jump in the car and when we get to the
place where we sell them she'll go where ever we go
with her kittens and well loose her. We have this cat she is
about seven years old and she is called a Purshan cat. She is
a good cat.

Your friend
Diana

V.

Dear Pen Pal,

My name is Donnie, I am ten years I am 4 ft, 5" I have
green eyes. The physical teacher work with me last year, I
had lots of fun. He took it easy on me, because I had heart
trouble.

I have a Mother and Father, two brothers one sister. We
have a dog it's name is Skippy. It is a girl, black and brown,
its paws are white. My mother and father have black hair.
One of my brothers and I have red hair. Our school had over
three hundred people in it. This year I have nine classes
alltogether. I have a very nice teacher, her name is Miss
Jones.

On week ends I go over to my friends house, we ride
bicks, we play football, and we play with the dogs. Mike and
I have fun together, we ride our bicks to the school, not just
to school but all over in our blocks. One day I got a box of
raisens, Mike and I split the box of raisens.

Your friend,
Donnie



